Identifying Top Five Business Establishment Hotspots in Melbourne
Authored by: Jnaneshwari Beerappa
Duration: 90 mins
Level: Intermediate
Pre-requisite Skills: Python, Data Analysis, Visualization, Basic Knowledge of Geospatial Data
Scenario

Scenario User Story: As a city planner for Melbourne, I need to identify the top five hotspots for business establishments within the city so that I can better allocate resources, plan infrastructure, and support local economic development.

  • Economic Growth: Target areas with high potential for business development.
  • Data Utilization: Leverage business establishment data for informed decision-making.
What this use case will teach you

At the end of this use case, you will be able to:

  • Understand how to process and analyze geospatial data
  • Use Python libraries such as Pandas, GeoPandas, and Matplotlib for data analysis and visualization
  • Identify key business hotspots using clustering techniques
  • Present data-driven insights for urban planning

Introduction to the Problem

In this use case, I focused on identifying the top five hotspots for business establishments in Melbourne. The rationale behind solving this problem is to help city planners and stakeholders make informed decisions about where to focus resources for infrastructure development, public services, and economic support. By analyzing the distribution of business establishments across the city, we can uncover patterns and trends that are crucial for effective urban planning.

The dataset used for this analysis includes business establishment records for Melbourne, which contains information such as location coordinates, business types, and establishment density. The data was sourced from the City of Melbourne's open data portal and further cleaned and processed to ensure accuracy and relevance for this analysis.

Steps 1 and 2: Collect and Load the Dataset

In [ ]:
import requests
import pandas as pd
from io import StringIO

# Function to collect data
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    #apikey = "" # use if datasets require API key permissions
    file_format = 'csv'  # renamed so it does not shadow the built-in format()

    url = f'{base_url}{dataset_id}/exports/{file_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC',
        # 'api_key': apikey
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # StringIO to read the CSV data
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None

# Set dataset_id to query for the API call
dataset_id = 'business-establishments-with-address-and-industry-classification'

# Save dataset to df variable
df = collect_data(dataset_id)

# Check number of records in df
print(f'The dataset contains {len(df)} records.')

# View df
df.head(5)
The dataset contains 374210 records.
Out[ ]:
census_year block_id property_id base_property_id clue_small_area trading_name business_address industry_anzsic4_code industry_anzsic4_description longitude latitude
0 2003 105 100172 100172 Melbourne (CBD) Wilson Parking Australia 24-46 A'Beckett Street MELBOURNE 3000 9533 Parking Services 144.962053 -37.808573
1 2003 105 103301 103301 Melbourne (CBD) Melbourne International Backpackers 442-450 Elizabeth Street MELBOURNE 3000 4400 Accommodation 144.960868 -37.808309
2 2003 105 103302 103302 Melbourne (CBD) Vacant 422-440 Elizabeth Street MELBOURNE 3000 0 Vacant Space 144.961017 -37.808630
3 2003 105 103302 103302 Melbourne (CBD) The Garden Cafe Shop 3, Ground , 422-440 Elizabeth Street MELB... 4511 Cafes and Restaurants 144.961017 -37.808630
4 2003 105 103302 103302 Melbourne (CBD) Telephony Australia Shop 5, Ground , 422-440 Elizabeth Street MELB... 5809 Other Telecommunications Services 144.961017 -37.808630

Step 3: Check for Missing Values in Latitude and Longitude Columns

Below is a Python script that checks for missing values in the latitude and longitude columns of the dataset:
In [ ]:
# Check for missing values in latitude and longitude columns
missing_latitude = df['latitude'].isnull().sum()
missing_longitude = df['longitude'].isnull().sum()

print(f"Missing values in 'latitude': {missing_latitude}")
print(f"Missing values in 'longitude': {missing_longitude}")
Missing values in 'latitude': 4785
Missing values in 'longitude': 4785

Step 4: Clean the Dataset by Dropping Rows with Missing Values

In [ ]:
# Drop rows with missing latitude or longitude values; .copy() makes an
# independent DataFrame and avoids SettingWithCopyWarning in later steps
df_cleaned = df.dropna(subset=['latitude', 'longitude']).copy()

# Confirm that there are no missing values left
missing_latitude_cleaned = df_cleaned['latitude'].isnull().sum()
missing_longitude_cleaned = df_cleaned['longitude'].isnull().sum()

print(f"Missing values after cleaning in 'latitude': {missing_latitude_cleaned}")
print(f"Missing values after cleaning in 'longitude': {missing_longitude_cleaned}")
Missing values after cleaning in 'latitude': 0
Missing values after cleaning in 'longitude': 0

Step 5: Ensure Latitude and Longitude Columns are Numeric

In [ ]:
# Ensure the latitude and longitude are numeric using .loc to avoid SettingWithCopyWarning
df_cleaned.loc[:, 'latitude'] = pd.to_numeric(df_cleaned['latitude'], errors='coerce')
df_cleaned.loc[:, 'longitude'] = pd.to_numeric(df_cleaned['longitude'], errors='coerce')

# Check the data types to confirm
print(df_cleaned[['latitude', 'longitude']].dtypes)
latitude     float64
longitude    float64
dtype: object

Step 6: Apply K-Means Clustering to Identify Business Hotspots

In [ ]:
from sklearn.cluster import KMeans

# Extract the latitude and longitude columns for clustering
locations = df_cleaned[['latitude', 'longitude']]

# Apply K-Means clustering to identify 5 clusters (hotspots);
# n_init is set explicitly to silence scikit-learn's FutureWarning
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df_cleaned['cluster'] = kmeans.fit_predict(locations)

# Count the number of businesses in each cluster
cluster_counts = df_cleaned['cluster'].value_counts().sort_values(ascending=False)

# Display the top 5 clusters
print(cluster_counts.head(5))
cluster
0    156255
4    124877
3     56994
1     19448
2     11851
Name: count, dtype: int64
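Here k=5 is fixed by the brief, but it is worth checking that the choice is not arbitrary. A common sanity check is the elbow method: fit K-Means for a range of k and look for the point where inertia stops dropping sharply. A minimal sketch on synthetic blobs (the coordinates and blob centres below are made up for illustration; in the notebook you would pass `df_cleaned[['latitude', 'longitude']]`):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for df_cleaned[['latitude', 'longitude']]: three tight blobs
# around made-up Melbourne-like coordinates.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=centre, scale=0.002, size=(100, 2))
    for centre in [(-37.81, 144.96), (-37.80, 144.94), (-37.84, 144.98)]
])
locations = pd.DataFrame(points, columns=['latitude', 'longitude'])

# Inertia (within-cluster sum of squared distances) for a range of k values;
# the "elbow" where the curve flattens suggests a reasonable cluster count.
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(locations)
    inertias[k] = km.inertia_

for k, inertia in inertias.items():
    print(f'k={k}: inertia={inertia:.5f}')
```

On this toy data the curve flattens after k=3, matching the three blobs; on the real coordinates the drop-off around k=5 would support the brief's choice.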

Step 7: Retrieve Centroid Coordinates of Top 5 Hotspots

In [ ]:
# Get the centroid coordinates of the top 5 clusters
top_clusters = cluster_counts.index[:5]
centroids = kmeans.cluster_centers_[top_clusters]

# Display the centroids of the top 5 hotspots
for i, centroid in enumerate(centroids):
    print(f"Hotspot {i+1}: Latitude {centroid[0]}, Longitude {centroid[1]}")
Hotspot 1: Latitude -37.81649338709314, Longitude 144.9593746816832
Hotspot 2: Latitude -37.81027385193321, Longitude 144.96895152469776
Hotspot 3: Latitude -37.80272317246289, Longitude 144.9454911043396
Hotspot 4: Latitude -37.81482455911137, Longitude 144.91942891148435
Hotspot 5: Latitude -37.83819289766458, Longitude 144.97715040816652
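Raw centroid coordinates are hard to interpret on their own. Since the dataset already carries a `clue_small_area` column, taking the most frequent small area within each cluster attaches a readable name to each hotspot. A sketch on toy data (the cluster labels and area names below are illustrative):

```python
import pandas as pd

# Toy stand-in for df_cleaned after cluster labels have been assigned.
df_cleaned = pd.DataFrame({
    'cluster': [0, 0, 0, 1, 1, 2],
    'clue_small_area': ['Melbourne (CBD)', 'Melbourne (CBD)', 'Carlton',
                        'Docklands', 'Docklands', 'Southbank'],
})

# The most common CLUE small area per cluster names each hotspot.
hotspot_names = (
    df_cleaned.groupby('cluster')['clue_small_area']
    .agg(lambda s: s.mode().iloc[0])
)
print(hotspot_names)
```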

Step 8: Visualize Hotspots on a Map Using Folium (Dynamic Centroid Extraction)

In [ ]:
import folium
from sklearn.cluster import KMeans

# Assuming you have already obtained the cleaned dataframe 'df_cleaned'
# Extract the latitude and longitude columns for clustering
locations = df_cleaned[['latitude', 'longitude']]

# Apply K-Means clustering to identify 5 clusters (hotspots);
# n_init is set explicitly to silence scikit-learn's FutureWarning
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df_cleaned['cluster'] = kmeans.fit_predict(locations)

# Extract the centroid coordinates from the K-Means clustering result
centroids = kmeans.cluster_centers_

# Create a base map centered around Melbourne
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)

# Add markers for each hotspot centroid
for i, centroid in enumerate(centroids):
    folium.Marker(
        location=[centroid[0], centroid[1]],  # The latitude and longitude of the centroid
        popup=f"Hotspot {i+1}",  # Popup label for the marker
        icon=folium.Icon(color="red", icon="info-sign"),  # Custom icon for the marker
    ).add_to(melbourne_map)

# Save the map to an HTML file
melbourne_map.save("melbourne_hotspots_map.html")

# If running in a Jupyter notebook, you can display the map directly
melbourne_map
Out[ ]:
[Output: interactive Folium map of Melbourne with five hotspot markers]

Objective: Analyze Existing Business Activity Based on Number of Businesses

The following analysis focuses on assessing the current business activity in Melbourne by evaluating the number of businesses operating in various sectors. This analysis will help identify which sectors have the highest concentration of businesses, providing insights into the city’s economic landscape.

Step 9: Data Inspection and Cleaning

In [ ]:
# Inspect column names
print(df.columns)

# Check for missing values
df.isnull().sum()

# Drop rows with any missing values (the simplest option; imputation is an alternative)
df = df.dropna()

# Verify the data is clean
df.head()
Index(['census_year', 'block_id', 'property_id', 'base_property_id',
       'clue_small_area', 'trading_name', 'business_address',
       'industry_anzsic4_code', 'industry_anzsic4_description', 'longitude',
       'latitude'],
      dtype='object')
Out[ ]:
census_year block_id property_id base_property_id clue_small_area trading_name business_address industry_anzsic4_code industry_anzsic4_description longitude latitude
0 2003 105 100172 100172 Melbourne (CBD) Wilson Parking Australia 24-46 A'Beckett Street MELBOURNE 3000 9533 Parking Services 144.962053 -37.808573
1 2003 105 103301 103301 Melbourne (CBD) Melbourne International Backpackers 442-450 Elizabeth Street MELBOURNE 3000 4400 Accommodation 144.960868 -37.808309
2 2003 105 103302 103302 Melbourne (CBD) Vacant 422-440 Elizabeth Street MELBOURNE 3000 0 Vacant Space 144.961017 -37.808630
3 2003 105 103302 103302 Melbourne (CBD) The Garden Cafe Shop 3, Ground , 422-440 Elizabeth Street MELB... 4511 Cafes and Restaurants 144.961017 -37.808630
4 2003 105 103302 103302 Melbourne (CBD) Telephony Australia Shop 5, Ground , 422-440 Elizabeth Street MELB... 5809 Other Telecommunications Services 144.961017 -37.808630

Step 10: Analysis of Business Activity by Industry Classification

In [ ]:
# Analysis: Number of businesses by industry classification
industry_column = 'industry_anzsic4_description'
industry_counts = df[industry_column].value_counts()
print(industry_counts)
industry_anzsic4_description
Vacant Space                                                       56221
Cafes and Restaurants                                              29373
Legal Services                                                     13371
Takeaway Food Services                                             11185
Computer System Design and Related Services                         9456
                                                                   ...  
Leather Tanning, Fur Dressing and Leather Product Manufacturing        2
Other Basic Polymer Manufacturing                                      2
Veterinary Pharmaceutical and Medicinal Product Manufacturing          2
Beef Cattle Farming (Specialised)                                      1
Other Basic Chemical Product Manufacturing n.e.c.                      1
Name: count, Length: 441, dtype: int64
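One caveat with the counts above: the dataset holds one record per business per census year since 2003, so a long-running business is counted once for every year it appears. For a snapshot of current activity, restrict to the most recent census year first. A minimal illustration on toy rows:

```python
import pandas as pd

# Toy stand-in for df: the same cafe appears in three census years.
df = pd.DataFrame({
    'census_year': [2003, 2004, 2005, 2005, 2005],
    'industry_anzsic4_description': ['Cafes and Restaurants'] * 3
                                    + ['Legal Services'] * 2,
})

# Counting across all years triple-counts the cafe ...
all_years = df['industry_anzsic4_description'].value_counts()

# ... so restrict to the most recent census year for a current snapshot.
latest = df[df['census_year'] == df['census_year'].max()]
snapshot = latest['industry_anzsic4_description'].value_counts()

print(all_years.to_dict())   # {'Cafes and Restaurants': 3, 'Legal Services': 2}
print(snapshot.to_dict())    # {'Legal Services': 2, 'Cafes and Restaurants': 1}
```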

Step 11: Analyze Top 10 Business Types in Dynamic Hotspots

In [ ]:
import pandas as pd
from sklearn.cluster import KMeans

# Assuming df_cleaned is your cleaned DataFrame with all necessary columns

# Extract the latitude and longitude columns for clustering
locations = df_cleaned[['latitude', 'longitude']]

# Apply K-Means clustering to identify 5 clusters (hotspots);
# n_init is set explicitly to silence scikit-learn's FutureWarning
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df_cleaned['cluster'] = kmeans.fit_predict(locations)

# Extract the centroid coordinates from the K-Means clustering result
centroids = kmeans.cluster_centers_

# Define a small radius to filter businesses around each centroid
# (0.01 degrees is roughly 1 km at Melbourne's latitude)
radius = 0.01

# Filter businesses within the top 5 hotspots (note: overlapping radii
# can include the same business more than once)
hotspot_businesses = pd.DataFrame()

for centroid in centroids:
    lat, lon = centroid
    filtered_df = df_cleaned[
        (df_cleaned['latitude'].between(lat - radius, lat + radius)) &
        (df_cleaned['longitude'].between(lon - radius, lon + radius))
    ]
    hotspot_businesses = pd.concat([hotspot_businesses, filtered_df])

# Group by industry classification and count the number of businesses
industry_distribution = hotspot_businesses.groupby('industry_anzsic4_description').size()

# Sort by the number of businesses and get the top 10
top_10_industries = industry_distribution.sort_values(ascending=False).head(10)

# Display the top 10 business types
print("Top 10 Business Types for Establishment in Top 5 Hotspots:")
print(top_10_industries)
Top 10 Business Types for Establishment in Top 5 Hotspots:
industry_anzsic4_description
Vacant Space                                       61647
Cafes and Restaurants                              35938
Legal Services                                     20019
Takeaway Food Services                             14828
Other Auxiliary Finance and Investment Services    13054
Computer System Design and Related Services        12332
Management Advice and Other Consulting Services    11873
Hairdressing and Beauty Services                    8750
Clothing Retailing                                  8426
Womens Clothing Retailing                           7991
dtype: int64
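'Vacant Space' tops this ranking, but it marks empty premises rather than operating businesses. If the goal is to rank actual business activity, it is worth excluding before taking the top 10. A sketch on toy data:

```python
import pandas as pd

# Toy stand-in for the hotspot businesses table.
hotspot_businesses = pd.DataFrame({
    'industry_anzsic4_description': ['Vacant Space'] * 4
                                    + ['Cafes and Restaurants'] * 3
                                    + ['Legal Services'] * 2,
})

# 'Vacant Space' records mark empty premises, not operating businesses,
# so drop them before ranking industries.
active = hotspot_businesses[
    hotspot_businesses['industry_anzsic4_description'] != 'Vacant Space'
]
top_industries = (
    active.groupby('industry_anzsic4_description').size()
    .sort_values(ascending=False)
)
print(top_industries)
```

(Conversely, keeping the vacancy counts separate is itself useful to a planner as a measure of underused premises.)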

Step 12: Visualize Top 10 Business Types with a Horizontal Bar Chart

In [ ]:
import seaborn as sns
import matplotlib.pyplot as plt

# Plot the horizontal bar chart with color coding
plt.figure(figsize=(10, 8))
sns.barplot(
    x=top_10_industries.values,
    y=top_10_industries.index,
    hue=top_10_industries.index,  # assign hue to avoid seaborn's palette deprecation warning
    palette='viridis',
    legend=False
)
plt.title('Top 10 Business Types for Establishment in Top 5 Hotspots')
plt.xlabel('Number of Businesses')
plt.ylabel('Industry Classification')
plt.show()
[Output: horizontal bar chart of the top 10 business types in the top 5 hotspots]

Step 13: Predicting Success Rate for New Businesses in Top 10 Industries within Top 5 Hotspots

This section describes how to develop a predictive model to estimate the success rate for starting a new business in one of the top 10 business activities within the top 5 hotspots identified in Melbourne.

1. Define Success Criteria

- Longevity-Based Success: A business could be considered successful if it has been operational for a certain number of years (e.g., 2 years or more).
- Revenue-Based Success: If revenue data is available, success can be defined by meeting a certain revenue threshold.
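The dataset itself has no explicit success column, so a longevity label has to be engineered first. One possible construction, sketched on toy rows (the 2-year threshold and the `property_id` + `trading_name` key are assumptions, not part of the source data): count the distinct census years each business appears in and flag those observed for two or more.

```python
import pandas as pd

# Toy stand-in for df_cleaned: one cafe surviving three census years,
# one shop that appears only once.
df_cleaned = pd.DataFrame({
    'property_id':  [1, 1, 1, 2],
    'trading_name': ['Cafe A', 'Cafe A', 'Cafe A', 'Shop B'],
    'census_year':  [2003, 2004, 2005, 2004],
})

# Longevity = number of distinct census years a business is observed.
longevity = (
    df_cleaned.groupby(['property_id', 'trading_name'])['census_year']
    .nunique()
    .rename('years_observed')
)

# Label a business successful if observed for 2+ census years,
# then join the label back onto the record level.
success = (longevity >= 2).rename('success').astype(int)
df_labelled = df_cleaned.merge(
    success, left_on=['property_id', 'trading_name'], right_index=True
)
print(df_labelled[['trading_name', 'success']].drop_duplicates())
```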

2. Feature Engineering

- Industry Classification: Include the business industry as a categorical feature.
- Location (Hotspot): Use the cluster or hotspot as a feature.
- Historical Success Data: Incorporate historical data about business longevity or success rates in specific industries and hotspots.

3. Model Selection

- Use a classification model such as Logistic Regression, Random Forest, or Gradient Boosting to predict whether a new business will be successful based on the above features.
- Features:
    - Industry classification (one of the top 10 industries).
    - Geographical location (hotspot cluster).
    - Other relevant features (e.g., initial investment if available).

4. Training the Model

- Split the Data: Divide your data into training and testing sets.
- Model Training: Train your classification model using historical business data.
- Evaluation: Evaluate the model on the test set to check accuracy, precision, recall, and other relevant metrics.

5. Prediction

- Input: New business data (e.g., industry classification, hotspot location).
- Output: Probability of success (e.g., 80% chance of being successful).

Example Workflow

In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Feature selection: industry classification and hotspot cluster
X = df_cleaned[['industry_anzsic4_description', 'cluster']]
X = pd.get_dummies(X, columns=['industry_anzsic4_description', 'cluster'])

# Target: success (0 or 1). Note: the raw dataset has no 'success' column;
# it must first be engineered, e.g. from business longevity as described above
y = df_cleaned['success']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Predict success probability for a new business
new_business = {'industry_anzsic4_description': 'Cafes and Restaurants', 'cluster': 2}
new_business_df = pd.DataFrame([new_business])
new_business_df = pd.get_dummies(new_business_df).reindex(columns=X.columns, fill_value=0)

success_probability = model.predict_proba(new_business_df)[0][1]
print(f"Predicted success probability: {success_probability:.2f}")
Accuracy: 0.9857481220816133
              precision    recall  f1-score   support

           0       0.33      0.00      0.00      1052
           1       0.99      1.00      0.99     72833

    accuracy                           0.99     73885
   macro avg       0.66      0.50      0.50     73885
weighted avg       0.98      0.99      0.98     73885

Predicted success probability: 0.99
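The classification report above is a warning sign: accuracy is high only because roughly 99% of records carry the success label, and recall for the failure class is 0.00, meaning the model essentially never predicts failure. One standard mitigation is class weighting. A sketch on synthetic imbalanced data (the 5% failure rate and the separating feature are fabricated for illustration; here class 1 plays the role of the rare failure class):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic imbalanced data: roughly 5% rare-class samples, separable on feature 0.
rng = np.random.default_rng(42)
n = 4000
y = (rng.random(n) < 0.05).astype(int)   # 1 = rare class
X = rng.normal(size=(n, 3))
X[:, 0] += 3.0 * y                        # shift the rare class so it is learnable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# class_weight='balanced' reweights samples inversely to class frequency,
# so the forest is penalised for ignoring the rare class.
model = RandomForestClassifier(class_weight='balanced', random_state=42)
model.fit(X_train, y_train)

minority_recall = recall_score(y_test, model.predict(X_test), pos_label=1)
print(f'Recall on the rare class: {minority_recall:.2f}')
```

Resampling (e.g. oversampling the minority class) and threshold tuning are alternatives; whatever the fix, minority-class recall and precision, not accuracy, are the metrics to watch here.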
In [ ]:
!pip install dash
!pip install dash-bootstrap-components
!pip install pyngrok
Requirement already satisfied: dash in /usr/local/lib/python3.10/dist-packages (2.17.1)
Collecting dash-bootstrap-components
  Downloading dash_bootstrap_components-1.6.0-py3-none-any.whl.metadata (5.2 kB)
Downloading dash_bootstrap_components-1.6.0-py3-none-any.whl (222 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 222.5/222.5 kB 3.7 MB/s eta 0:00:00
Installing collected packages: dash-bootstrap-components
Successfully installed dash-bootstrap-components-1.6.0
Collecting pyngrok
  Downloading pyngrok-7.2.0-py3-none-any.whl.metadata (7.4 kB)
Requirement already satisfied: PyYAML>=5.1 in /usr/local/lib/python3.10/dist-packages (from pyngrok) (6.0.2)
Downloading pyngrok-7.2.0-py3-none-any.whl (22 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.0
In [ ]:
import requests
import pandas as pd
from io import StringIO

# Function to collect data
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    dataset_id = dataset_id
    format = 'csv'

    url = f'{base_url}{dataset_id}/exports/{format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC',
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # StringIO to read the CSV data
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None

# Set dataset_id to query for the API call
dataset_id = 'business-establishments-with-address-and-industry-classification'

# Save dataset to df variable
df = collect_data(dataset_id)

if df is not None:
    # Check number of records in df
    print(f'The dataset contains {len(df)} records.')

    # Save the DataFrame to a CSV file
    df.to_csv('melbourne_business_data.csv', index=False)
    print("Dataset saved as 'melbourne_business_data.csv'")

    # Optionally view the first few rows of the dataset
    print(df.head(5))
The dataset contains 374210 records.
Dataset saved as 'melbourne_business_data.csv'
   census_year  block_id  property_id  base_property_id clue_small_area  \
0         2017       266       109851            109851         Carlton   
1         2017       266       109851            109851         Carlton   
2         2017       266       534003            534003         Carlton   
3         2017       266       664003            664003         Carlton   
4         2017       266       664005            664005         Carlton   

                   trading_name                          business_address  \
0  Metropoli's Research Pty Ltd  Level 1, 74 Victoria Street CARLTON 3053   
1             J Hong Restaurant  Ground , 74 Victoria Street CARLTON 3053   
2                  St2 Expresso           70 Victoria Street CARLTON 3053   
3            RMIT Resources Ltd           20 Cardigan Street CARLTON 3053   
4                        vacant           24 Cardigan Street CARLTON 3053   

   industry_anzsic4_code              industry_anzsic4_description  \
0                   6950  Market Research and Statistical Services   
1                   4511                     Cafes and Restaurants   
2                   4512                    Takeaway Food Services   
3                   8102                          Higher Education   
4                      0                              Vacant Space   

    longitude   latitude  
0  144.965352 -37.806701  
1  144.965352 -37.806701  
2  144.965473 -37.806714  
3  144.964753 -37.806312  
4  144.964772 -37.806203  
In [ ]:
from google.colab import files
files.download('melbourne_business_data.csv')
In [ ]:
!pip install dash
!pip install dash-bootstrap-components
!pip install pyngrok
Requirement already satisfied: dash in /usr/local/lib/python3.10/dist-packages (2.17.1)
Requirement already satisfied: Flask<3.1,>=1.0.4 in /usr/local/lib/python3.10/dist-packages (from dash) (2.2.5)
Requirement already satisfied: Werkzeug<3.1 in /usr/local/lib/python3.10/dist-packages (from dash) (3.0.4)
Requirement already satisfied: plotly>=5.0.0 in /usr/local/lib/python3.10/dist-packages (from dash) (5.15.0)
Requirement already satisfied: dash-html-components==2.0.0 in /usr/local/lib/python3.10/dist-packages (from dash) (2.0.0)
Requirement already satisfied: dash-core-components==2.0.0 in /usr/local/lib/python3.10/dist-packages (from dash) (2.0.0)
Requirement already satisfied: dash-table==5.0.0 in /usr/local/lib/python3.10/dist-packages (from dash) (5.0.0)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.10/dist-packages (from dash) (8.4.0)
Requirement already satisfied: typing-extensions>=4.1.1 in /usr/local/lib/python3.10/dist-packages (from dash) (4.12.2)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from dash) (2.32.3)
Requirement already satisfied: retrying in /usr/local/lib/python3.10/dist-packages (from dash) (1.3.4)
Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.10/dist-packages (from dash) (1.6.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from dash) (71.0.4)
Requirement already satisfied: Jinja2>=3.0 in /usr/local/lib/python3.10/dist-packages (from Flask<3.1,>=1.0.4->dash) (3.1.4)
Requirement already satisfied: itsdangerous>=2.0 in /usr/local/lib/python3.10/dist-packages (from Flask<3.1,>=1.0.4->dash) (2.2.0)
Requirement already satisfied: click>=8.0 in /usr/local/lib/python3.10/dist-packages (from Flask<3.1,>=1.0.4->dash) (8.1.7)
Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from plotly>=5.0.0->dash) (9.0.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from plotly>=5.0.0->dash) (24.1)
Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.10/dist-packages (from Werkzeug<3.1->dash) (2.1.5)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata->dash) (3.20.1)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->dash) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->dash) (3.8)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->dash) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->dash) (2024.7.4)
Requirement already satisfied: six>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from retrying->dash) (1.16.0)
Requirement already satisfied: dash-bootstrap-components in /usr/local/lib/python3.10/dist-packages (1.6.0)
Requirement already satisfied: pyngrok in /usr/local/lib/python3.10/dist-packages (7.2.0)
Requirement already satisfied: PyYAML>=5.1 in /usr/local/lib/python3.10/dist-packages (from pyngrok) (6.0.2)
In [ ]:
import pandas as pd
from google.colab import files

# Upload the cleaned CSV from your machine (Colab-only widget)
uploaded = files.upload()

df_cleaned = pd.read_csv('/content/melbourne_business_data.csv')
print(df_cleaned.head())
Saving melbourne_business_data.csv to melbourne_business_data (1).csv
   census_year  block_id  property_id  base_property_id clue_small_area  \
0         2017       266       109851            109851         Carlton   
1         2017       266       109851            109851         Carlton   
2         2017       266       534003            534003         Carlton   
3         2017       266       664003            664003         Carlton   
4         2017       266       664005            664005         Carlton   

                   trading_name                          business_address  \
0  Metropoli's Research Pty Ltd  Level 1, 74 Victoria Street CARLTON 3053   
1             J Hong Restaurant  Ground , 74 Victoria Street CARLTON 3053   
2                  St2 Expresso           70 Victoria Street CARLTON 3053   
3            RMIT Resources Ltd           20 Cardigan Street CARLTON 3053   
4                        vacant           24 Cardigan Street CARLTON 3053   

   industry_anzsic4_code              industry_anzsic4_description  \
0                   6950  Market Research and Statistical Services   
1                   4511                     Cafes and Restaurants   
2                   4512                    Takeaway Food Services   
3                   8102                          Higher Education   
4                      0                              Vacant Space   

    longitude   latitude  
0  144.965352 -37.806701  
1  144.965352 -37.806701  
2  144.965473 -37.806714  
3  144.964753 -37.806312  
4  144.964772 -37.806203  
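One caveat before mapping: the CLUE extract stacks one row per establishment per census year, so a heatmap over every row weights long-running businesses more heavily. A minimal sketch (on a toy frame; the notebook itself does not perform this step) that keeps only the most recent census year:

```python
import pandas as pd

# Toy frame in the CLUE layout: one row per establishment per census year
df_cleaned = pd.DataFrame({
    'census_year': [2017, 2018, 2018, 2017],
    'trading_name': ['A', 'A', 'B', 'C'],
    'latitude': [-37.8067, -37.8067, -37.8063, -37.8062],
    'longitude': [144.9654, 144.9654, 144.9648, 144.9648],
})

# Keep only the latest census year so each establishment is counted once
latest_year = df_cleaned['census_year'].max()
df_latest = df_cleaned[df_cleaned['census_year'] == latest_year]
print(latest_year, len(df_latest))
```

Passing `df_latest[['latitude', 'longitude']].values.tolist()` to `HeatMap` would then weight each current establishment equally.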
In [ ]:
import folium
import pandas as pd
from folium.plugins import HeatMap

# Load your cleaned data
df_cleaned = pd.read_csv('/content/melbourne_business_data.csv')

# Filter the data to include only rows with valid latitude and longitude values
df_cleaned = df_cleaned.dropna(subset=['latitude', 'longitude'])

# Aggregate data by latitude and longitude for the heatmap
heatmap_data = df_cleaned[['latitude', 'longitude']].values.tolist()

# Create a base map centered around Melbourne
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=12)

# Add the HeatMap layer to the map
HeatMap(heatmap_data, radius=10).add_to(melbourne_map)

# Add markers for the prominent hotspots
hotspots = [
    {"name": "Hotspot 1", "lat": -37.81649338709314, "lon": 144.9593746816832},
    {"name": "Hotspot 2", "lat": -37.81027385193321, "lon": 144.96895152469776},
    {"name": "Hotspot 3", "lat": -37.80272317246289, "lon": 144.9454911043396},
    {"name": "Hotspot 4", "lat": -37.81482455911137, "lon": 144.91942891148435},
    {"name": "Hotspot 5", "lat": -37.83819289766458, "lon": 144.97715040816652},
]

# Add a marker for each hotspot
for hotspot in hotspots:
    folium.Marker(
        location=[hotspot["lat"], hotspot["lon"]],
        popup=hotspot["name"],
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(melbourne_map)

# Display the map
melbourne_map
Out[ ]:
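The five hotspot markers above use hard-coded coordinates. The pre-requisites mention clustering, and centroids like these could plausibly be derived with k-means; below is a sketch using scikit-learn's `KMeans` on toy coordinates (two clusters for brevity; the map uses five). This is an illustration, not the notebook's actual derivation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy coordinates; in the notebook this would be
# df_cleaned[['latitude', 'longitude']].to_numpy()
coords = np.array([
    [-37.816, 144.959], [-37.817, 144.960], [-37.815, 144.958],
    [-37.810, 144.969], [-37.811, 144.968], [-37.809, 144.970],
])

# Fit k-means; each cluster centre is a candidate hotspot (lat, lon)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
centroids = kmeans.cluster_centers_
print(centroids.shape)
```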
In [ ]:
import requests
import pandas as pd
from io import StringIO

# Function to collect data
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    # apikey = ""  # use if datasets require API key permissions
    file_format = 'csv'  # avoid shadowing the built-in format()

    url = f'{base_url}{dataset_id}/exports/{file_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC',
        # 'api_key': apikey
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # StringIO to read the CSV data
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None

# Set dataset_id to query for the API call
dataset_id = 'business-establishments-with-address-and-industry-classification'

# Save dataset to df variable
df = collect_data(dataset_id)

# Check number of records in df
print(f'The dataset contains {len(df)} records.')

# View df
df.head(5)
The dataset contains 374210 records.
Out[ ]:
   census_year  block_id  property_id  base_property_id clue_small_area  \
0         2017       266       109851            109851         Carlton
1         2017       266       109851            109851         Carlton
2         2017       266       534003            534003         Carlton
3         2017       266       664003            664003         Carlton
4         2017       266       664005            664005         Carlton

                   trading_name                          business_address  \
0  Metropoli's Research Pty Ltd  Level 1, 74 Victoria Street CARLTON 3053
1             J Hong Restaurant  Ground , 74 Victoria Street CARLTON 3053
2                  St2 Expresso           70 Victoria Street CARLTON 3053
3            RMIT Resources Ltd           20 Cardigan Street CARLTON 3053
4                        vacant           24 Cardigan Street CARLTON 3053

   industry_anzsic4_code              industry_anzsic4_description  \
0                   6950  Market Research and Statistical Services
1                   4511                     Cafes and Restaurants
2                   4512                    Takeaway Food Services
3                   8102                          Higher Education
4                      0                              Vacant Space

    longitude   latitude
0  144.965352 -37.806701
1  144.965352 -37.806701
2  144.965473 -37.806714
3  144.964753 -37.806312
4  144.964772 -37.806203
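Several later cells re-download the full ~374,210-record export. One option is to cache the frame to disk after the first fetch and reload it in subsequent cells; a small sketch on a toy frame (the cache file name is illustrative):

```python
import pandas as pd

# Toy stand-in for the freshly fetched frame
df = pd.DataFrame({'trading_name': ['A', 'B'], 'census_year': [2017, 2018]})

# Write the export to disk once ...
df.to_csv('clue_business_cache.csv', index=False)

# ... then later cells can reload it instead of hitting the API again
df_cached = pd.read_csv('clue_business_cache.csv')
print(len(df_cached))
```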
In [ ]:
import requests
import pandas as pd
from io import StringIO

# Function to collect data
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    file_format = 'csv'  # avoid shadowing the built-in format()

    url = f'{base_url}{dataset_id}/exports/{file_format}'
    params = {
        'select': '*',
        'limit': -1,  # Fetch all records
        'lang': 'en',
        'timezone': 'UTC',
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # Read the CSV content
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None

# Set dataset_id for the API call
dataset_id = 'business-establishments-with-address-and-industry-classification'

# Collect data and store in df
df = collect_data(dataset_id)

# Proceed if data was loaded successfully
if df is not None:
    print(f'The dataset contains {len(df)} records.')
    print(df.head(5))  # Display the first 5 rows to inspect the data
else:
    print("Failed to retrieve data.")
The dataset contains 374210 records.
   census_year  block_id  property_id  base_property_id clue_small_area  \
0         2017       266       109851            109851         Carlton   
1         2017       266       109851            109851         Carlton   
2         2017       266       534003            534003         Carlton   
3         2017       266       664003            664003         Carlton   
4         2017       266       664005            664005         Carlton   

                   trading_name                          business_address  \
0  Metropoli's Research Pty Ltd  Level 1, 74 Victoria Street CARLTON 3053   
1             J Hong Restaurant  Ground , 74 Victoria Street CARLTON 3053   
2                  St2 Expresso           70 Victoria Street CARLTON 3053   
3            RMIT Resources Ltd           20 Cardigan Street CARLTON 3053   
4                        vacant           24 Cardigan Street CARLTON 3053   

   industry_anzsic4_code              industry_anzsic4_description  \
0                   6950  Market Research and Statistical Services   
1                   4511                     Cafes and Restaurants   
2                   4512                    Takeaway Food Services   
3                   8102                          Higher Education   
4                      0                              Vacant Space   

    longitude   latitude  
0  144.965352 -37.806701  
1  144.965352 -37.806701  
2  144.965473 -37.806714  
3  144.964753 -37.806312  
4  144.964772 -37.806203  

1. Time-Series Analysis (Based on census_year)

Objective: Analyze how business activities have evolved over time.

Approach: Use the census_year column to track the growth or decline in business establishments in specific areas (clue_small_area).

What to do:

  • Count businesses over time: Group the data by census_year and analyze the number of businesses established or closed.
  • Trend analysis: Visualize trends to identify which areas (based on clue_small_area) have seen the most growth in business activity.
In [ ]:
import matplotlib.pyplot as plt

# Group data by census_year and count the number of businesses established each year
business_trend = df.groupby('census_year')['trading_name'].count().reset_index()

# Plot the trend over time
plt.figure(figsize=(10, 6))  # Adjust the figure size for better clarity
plt.plot(business_trend['census_year'], business_trend['trading_name'], marker='o', linestyle='-', color='b')

# Add title and labels
plt.title('Business Establishments Over Time', fontsize=14)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Number of Businesses', fontsize=12)

# Show the plot
plt.grid(True)  # Add gridlines for readability
plt.show()
[Figure: line chart of the number of business establishments per census year]
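The chart above aggregates the whole city; the second bullet asks which `clue_small_area` values grew most. A pivot gives one series per area, sketched here on a toy frame (the real notebook would pass `df` instead):

```python
import pandas as pd

# Toy frame: one row per establishment per census year, with its area
df = pd.DataFrame({
    'census_year': [2017, 2017, 2018, 2018, 2018],
    'clue_small_area': ['Carlton', 'Docklands', 'Carlton', 'Carlton', 'Docklands'],
    'trading_name': ['A', 'B', 'A', 'C', 'B'],
})

# Rows: census year, columns: area, values: number of establishments
area_trend = df.pivot_table(index='census_year', columns='clue_small_area',
                            values='trading_name', aggfunc='count', fill_value=0)
print(area_trend)
# area_trend.plot(marker='o') would draw one trend line per area
```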

2. Industry Analysis

Objective: Analyze the concentration of different industries (industry_anzsic4_description).

Approach: Use the industry_anzsic4_description column to see which industries are dominant in certain geographic areas or time periods.

What to do:

  • Top industries: Find the top industries by counting the occurrences of each industry type.
  • Industry hotspots: Combine geographic data with industry information to highlight specific areas where certain industries are more prevalent.
In [ ]:
import pandas as pd

# Assuming df is already loaded and contains the columns 'industry_anzsic4_description' and 'clue_small_area'

# 1. Top 5 industries by frequency in the dataset
top_industries = df['industry_anzsic4_description'].value_counts().head(5)
print("Top 5 Industries:\n", top_industries)

# 2. Industry hotspots: Grouping by area (clue_small_area) and industry to find prevalent industries in each area
industry_hotspots = df.groupby(['clue_small_area', 'industry_anzsic4_description']).size().reset_index(name='counts')

# Sorting to find the areas with the most prevalent industries
industry_hotspots_sorted = industry_hotspots.sort_values(by='counts', ascending=False)

# Display the top 10 industry hotspots
print("Top 10 Industry Hotspots:\n", industry_hotspots_sorted.head(10))

# Optional: Saving the industry hotspots to a CSV for further analysis
industry_hotspots_sorted.to_csv('industry_hotspots.csv', index=False)
Top 5 Industries:
 industry_anzsic4_description
Vacant Space                                   57392
Cafes and Restaurants                          29666
Legal Services                                 13415
Takeaway Food Services                         11304
Computer System Design and Related Services     9580
Name: count, dtype: int64
Top 10 Industry Hotspots:
       clue_small_area                     industry_anzsic4_description  counts
1232  Melbourne (CBD)                                     Vacant Space   27689
921   Melbourne (CBD)                            Cafes and Restaurants   17570
1038  Melbourne (CBD)                                   Legal Services   11702
1222  Melbourne (CBD)                           Takeaway Food Services    7440
1089  Melbourne (CBD)  Other Auxiliary Finance and Investment Services    7269
476         Docklands                                     Vacant Space    6793
1048  Melbourne (CBD)  Management Advice and Other Consulting Services    6490
944   Melbourne (CBD)      Computer System Design and Related Services    6077
1722  North Melbourne                                     Vacant Space    5521
1011  Melbourne (CBD)                 Hairdressing and Beauty Services    4330
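Raw counts naturally favour large areas such as the CBD. To compare areas fairly, one option (not shown in the notebook) is to normalise each industry's count by its area total; a sketch on a toy frame shaped like `industry_hotspots`:

```python
import pandas as pd

# Toy counts in the same shape as industry_hotspots
industry_hotspots = pd.DataFrame({
    'clue_small_area': ['Melbourne (CBD)', 'Melbourne (CBD)', 'Docklands'],
    'industry_anzsic4_description': ['Cafes and Restaurants', 'Legal Services',
                                     'Cafes and Restaurants'],
    'counts': [17570, 11702, 900],
})

# Divide each industry count by its area's total to get a within-area share
area_totals = industry_hotspots.groupby('clue_small_area')['counts'].transform('sum')
industry_hotspots['area_share'] = industry_hotspots['counts'] / area_totals
print(industry_hotspots[['clue_small_area', 'industry_anzsic4_description',
                         'area_share']])
```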

Proximity Analysis (Distance to Competitors or Key Locations)

Objective: Analyze how proximity to competitors, transportation hubs, or key business districts affects business success.

Approach: Calculate the distance between businesses or specific geographic landmarks (e.g., train stations, shopping centers) and analyze the impact on business performance.

What to do:

  • Distance to key locations: Use the geopy library to calculate the distance between businesses and key landmarks.
  • Impact on success: Analyze how distance from these landmarks affects business success rates.
In [ ]:
import pandas as pd
from geopy.distance import geodesic

# List of key locations (e.g., transportation hubs and important landmarks)
key_locations = [
    (-37.8182711, 144.9670618),  # Flinders Street Station
    (-37.8077867, 144.9624251),  # Southern Cross Station
    (-37.814, 144.96332),        # Melbourne City Central
    (-37.81197, 144.97316),      # Parliament Station
    (-37.81149, 144.95436),      # Flagstaff Station
]

# Function to calculate the distance from a business to the nearest key location
def calculate_nearest_location(lat, lon, locations):
    distances = [geodesic((lat, lon), location).km for location in locations]
    return min(distances)

# Filter out rows with missing latitude or longitude;
# .copy() avoids the SettingWithCopyWarning when adding the distance column
df_filtered = df.dropna(subset=['latitude', 'longitude']).copy()

# Ensure that latitude and longitude columns are available in the filtered dataset
if 'latitude' in df_filtered.columns and 'longitude' in df_filtered.columns:
    # Calculate the distance to the nearest key location for each business
    df_filtered['distance_to_nearest_key_location'] = df_filtered.apply(
        lambda row: calculate_nearest_location(row['latitude'], row['longitude'], key_locations), axis=1
    )

    # Analyze proximity by geographic area or industry (or other metrics)
    proximity_analysis_by_area = df_filtered.groupby('clue_small_area')['distance_to_nearest_key_location'].mean().reset_index()
    proximity_analysis_by_industry = df_filtered.groupby('industry_anzsic4_description')['distance_to_nearest_key_location'].mean().reset_index()

    # Display results
    print("Proximity Analysis by Geographic Area:")
    print(proximity_analysis_by_area.head())

    print("\nProximity Analysis by Industry:")
    print(proximity_analysis_by_industry.head())
else:
    print("'latitude' and 'longitude' columns are missing from the dataset.")
Proximity Analysis by Geographic Area:
   clue_small_area  distance_to_nearest_key_location
0          Carlton                          0.857704
1        Docklands                          1.159365
2   East Melbourne                          0.833543
3       Kensington                          3.072435
4  Melbourne (CBD)                          0.339598

Proximity Analysis by Industry:
                  industry_anzsic4_description  \
0                                Accommodation   
1                          Accounting Services   
2  Adult, Community and Other Education n.e.c.   
3                         Advertising Services   
4               Aged Care Residential Services   

   distance_to_nearest_key_location  
0                          0.818983  
1                          0.633597  
2                          0.580261  
3                          1.015940  
4                          2.438420  
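The second bullet ("impact on success") is not implemented above, and the dataset carries no revenue field. One rough proxy, and it is only an assumption, is the share of 'Vacant Space' records per distance band; sketched on a toy frame carrying the computed distance column:

```python
import pandas as pd

# Toy frame with the computed distance column and industry labels
df_filtered = pd.DataFrame({
    'distance_to_nearest_key_location': [0.2, 0.4, 1.5, 2.5, 2.8],
    'industry_anzsic4_description': ['Vacant Space', 'Cafes and Restaurants',
                                     'Vacant Space', 'Vacant Space',
                                     'Legal Services'],
})

# Bin distances into 1 km bands, then take the vacancy share per band
df_filtered['band'] = pd.cut(df_filtered['distance_to_nearest_key_location'],
                             bins=[0, 1, 2, 3])
df_filtered['is_vacant'] = df_filtered['industry_anzsic4_description'].eq('Vacant Space')
vacancy_rate = df_filtered.groupby('band', observed=True)['is_vacant'].mean()
print(vacancy_rate)
```

A lower vacancy share near the key locations would weakly support the proximity hypothesis; it is not a substitute for actual performance data.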

Business Location Analysis for Carlton, Flinders Street, and Chapel Street

User Story: As a business entrepreneur, I want to identify the most suitable location to start my business in Melbourne. I am particularly interested in understanding the business landscape and industry distribution near Carlton, Flinders Street, and Chapel Street. This analysis will help me determine which area offers the best opportunity based on proximity to key business districts and competitors.

Objective: Analyze the business environment within 3 km of Carlton, Flinders Street, and Chapel Street, focusing on industry distribution and business concentration.

Approach:

  • Use the geographic coordinates of Carlton, Flinders Street, and Chapel Street to calculate distances from each business in the dataset.
  • Filter businesses located within a 3 km radius of each location.
  • Analyze the distribution of industries in each area to identify the top business sectors.
  • Compare the business density and diversity across Carlton, Flinders Street, and Chapel Street to help the entrepreneur make an informed decision.

Steps:

  • Step 1: Extract business data for all locations in Melbourne using the Melbourne Open Data API.
  • Step 2: Define the geographic coordinates for Carlton, Flinders Street, and Chapel Street:
    • Carlton: Latitude: -37.800, Longitude: 144.966
    • Flinders Street: Latitude: -37.8182711, Longitude: 144.9670618
    • Chapel Street: Latitude: -37.8517106, Longitude: 144.9939362
  • Step 3: Calculate the distance from each business to the selected locations using the geodesic distance calculation (using the geopy library).
  • Step 4: Filter the businesses within a 3 km radius of each location (Carlton, Flinders Street, and Chapel Street).
  • Step 5: Analyze the industry distribution by grouping businesses by industry code and description.
  • Step 6: Visualize the industry distribution for each location, comparing the top 10 industries for businesses in Carlton, Flinders Street, and Chapel Street.
In [ ]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO

# Step 1: Function to collect data from the Melbourne Open Data portal
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    file_format = 'csv'  # avoid shadowing the built-in format()

    url = f'{base_url}{dataset_id}/exports/{file_format}'
    params = {
        'select': '*',
        'limit': -1,  # Fetch all records
        'lang': 'en',
        'timezone': 'UTC',
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # Read the CSV content
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None

# Step 2: Set dataset_id for the API call
dataset_id = 'business-establishments-with-address-and-industry-classification'

# Step 3: Collect data and store in df
df = collect_data(dataset_id)

# Step 4: Proceed if data was loaded successfully
if df is not None:
    print(f'The dataset contains {len(df)} records.')
    print(df.head(5))  # Display the first 5 rows to inspect the data
else:
    print("Failed to retrieve data.")

# Step 5: Group by area and industry for distribution analysis
if 'clue_small_area' in df.columns and 'industry_anzsic4_description' in df.columns:
    size_distribution = df.groupby(['clue_small_area', 'industry_anzsic4_description']).size().reset_index(name='business_count')

    # Display the distribution
    print("Business Distribution by Area and Industry:\n", size_distribution)

    # Step 6: Visualization - Horizontal bar chart for a specific area
    plt.figure(figsize=(10, 6))
    area = 'Carlton'  # Example area
    area_data = size_distribution[size_distribution['clue_small_area'] == area]

    # Sort data to display the top industries first (optional)
    area_data = area_data.sort_values(by='business_count', ascending=False).head(10)  # Top 10 industries

    # Create a horizontal bar chart
    plt.barh(area_data['industry_anzsic4_description'], area_data['business_count'], color='skyblue')

    # Customize the plot
    plt.title(f'Business Distribution by Industry in {area}')
    plt.xlabel('Number of Businesses')
    plt.ylabel('Industry')
    plt.xticks(rotation=0)  # Ensure the x-axis labels are horizontal for readability

    # Show the chart
    plt.show()

else:
    print("Required columns ('clue_small_area' or 'industry_anzsic4_description') are missing.")
The dataset contains 374210 records.
   census_year  block_id  property_id  base_property_id clue_small_area  \
0         2017       266       109851            109851         Carlton   
1         2017       266       109851            109851         Carlton   
2         2017       266       534003            534003         Carlton   
3         2017       266       664003            664003         Carlton   
4         2017       266       664005            664005         Carlton   

                   trading_name                          business_address  \
0  Metropoli's Research Pty Ltd  Level 1, 74 Victoria Street CARLTON 3053   
1             J Hong Restaurant  Ground , 74 Victoria Street CARLTON 3053   
2                  St2 Expresso           70 Victoria Street CARLTON 3053   
3            RMIT Resources Ltd           20 Cardigan Street CARLTON 3053   
4                        vacant           24 Cardigan Street CARLTON 3053   

   industry_anzsic4_code              industry_anzsic4_description  \
0                   6950  Market Research and Statistical Services   
1                   4511                     Cafes and Restaurants   
2                   4512                    Takeaway Food Services   
3                   8102                          Higher Education   
4                      0                              Vacant Space   

    longitude   latitude  
0  144.965352 -37.806701  
1  144.965352 -37.806701  
2  144.965473 -37.806714  
3  144.964753 -37.806312  
4  144.964772 -37.806203  
Business Distribution by Area and Industry:
                    clue_small_area  \
0                          Carlton   
1                          Carlton   
2                          Carlton   
3                          Carlton   
4                          Carlton   
...                            ...   
2749  West Melbourne (Residential)   
2750  West Melbourne (Residential)   
2751  West Melbourne (Residential)   
2752  West Melbourne (Residential)   
2753  West Melbourne (Residential)   

                           industry_anzsic4_description  business_count  
0                                         Accommodation            1015  
1                                   Accounting Services             597  
2           Adult, Community and Other Education n.e.c.              83  
3                                  Advertising Services              75  
4                        Aged Care Residential Services               9  
...                                                 ...             ...  
2749         Wired Telecommunications Network Operation              25  
2750                          Womens Clothing Retailing               9  
2751  Wooden Furniture and Upholstered Seat Manufact...              10  
2752  Wooden Structural Fitting and Component Manufa...               2  
2753                                   Wool Wholesaling               3  

[2754 rows x 3 columns]
[Figure: horizontal bar chart of the top 10 industries by business count in Carlton]
In [ ]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO
from geopy.distance import geodesic

# Step 1: Function to collect data from the Melbourne Open Data portal
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    file_format = 'csv'  # avoid shadowing the built-in format()

    url = f'{base_url}{dataset_id}/exports/{file_format}'
    params = {
        'select': '*',
        'limit': -1,  # Fetch all records
        'lang': 'en',
        'timezone': 'UTC',
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # Read the CSV content
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None

# Step 2: Set dataset_id for the API call
dataset_id = 'business-establishments-with-address-and-industry-classification'

# Step 3: Collect data and store in df
df = collect_data(dataset_id)

# Step 4: Proceed if data was loaded successfully
if df is not None:
    print(f'The dataset contains {len(df)} records.')
    print(df.head(5))  # Display the first 5 rows to inspect the data
else:
    print("Failed to retrieve data.")

# Step 5: Define the Flinders Street coordinates (hard-coded)
flinders_street_coords = (-37.8182711, 144.9670618)  # Latitude and Longitude of Flinders Street

# Step 6: Function to calculate the distance from Flinders Street
def calculate_distance(row):
    business_coords = (row['latitude'], row['longitude'])
    return geodesic(business_coords, flinders_street_coords).km

# Step 7: Drop rows with missing latitude or longitude;
# .copy() avoids the SettingWithCopyWarning when adding the distance column
df_cleaned = df.dropna(subset=['latitude', 'longitude']).copy()

# Step 8: Apply the distance calculation to the cleaned dataset
df_cleaned['distance_from_flinders'] = df_cleaned.apply(calculate_distance, axis=1)

# Step 9: Filter businesses within 1 km of Flinders Street
df_near_flinders = df_cleaned[df_cleaned['distance_from_flinders'] <= 1]

# Step 10: Group by industry for businesses near Flinders Street
if 'industry_anzsic4_description' in df.columns:
    industry_distribution_flinders = df_near_flinders.groupby('industry_anzsic4_description').size().reset_index(name='business_count')

    # Display the distribution
    print("Industry Distribution for Businesses near Flinders Street:\n", industry_distribution_flinders)

    # Step 11: Visualization - Horizontal bar chart for industries near Flinders Street
    plt.figure(figsize=(10, 6))

    # Sort data to display the top industries first (optional)
    industry_distribution_flinders = industry_distribution_flinders.sort_values(by='business_count', ascending=False).head(10)  # Top 10 industries

    # Create a horizontal bar chart
    plt.barh(industry_distribution_flinders['industry_anzsic4_description'], industry_distribution_flinders['business_count'], color='skyblue')

    # Customize the plot
    plt.title('Top 10 Industries for Businesses within 1 km of Flinders Street')
    plt.xlabel('Number of Businesses')
    plt.ylabel('Industry')
    plt.xticks(rotation=0)  # Ensure the x-axis labels are horizontal for readability

    # Show the chart
    plt.show()

else:
    print("Required column 'industry_anzsic4_description' is missing.")
The dataset contains 374210 records.
   census_year  block_id  property_id  base_property_id clue_small_area  \
0         2017       266       109851            109851         Carlton   
1         2017       266       109851            109851         Carlton   
2         2017       266       534003            534003         Carlton   
3         2017       266       664003            664003         Carlton   
4         2017       266       664005            664005         Carlton   

                   trading_name                          business_address  \
0  Metropoli's Research Pty Ltd  Level 1, 74 Victoria Street CARLTON 3053   
1             J Hong Restaurant  Ground , 74 Victoria Street CARLTON 3053   
2                  St2 Expresso           70 Victoria Street CARLTON 3053   
3            RMIT Resources Ltd           20 Cardigan Street CARLTON 3053   
4                        vacant           24 Cardigan Street CARLTON 3053   

   industry_anzsic4_code              industry_anzsic4_description  \
0                   6950  Market Research and Statistical Services   
1                   4511                     Cafes and Restaurants   
2                   4512                    Takeaway Food Services   
3                   8102                          Higher Education   
4                      0                              Vacant Space   

    longitude   latitude  
0  144.965352 -37.806701  
1  144.965352 -37.806701  
2  144.965473 -37.806714  
3  144.964753 -37.806312  
4  144.964772 -37.806203  
Industry Distribution for Businesses near Flinders Street:
                           industry_anzsic4_description  business_count
0                                        Accommodation            1796
1                                  Accounting Services            3077
2          Adult, Community and Other Education n.e.c.             680
3                                 Advertising Services             910
4                       Aged Care Residential Services              24
..                                                 ...             ...
356                          Womens Footwear Retailing             621
357  Wooden Furniture and Upholstered Seat Manufact...              16
358  Wooden Structural Fitting and Component Manufa...               6
359                                   Wool Wholesaling              18
360         Zoological and Botanical Gardens Operation              23

[361 rows x 2 columns]
[Figure: horizontal bar chart of the top 10 industries for businesses within 1 km of Flinders Street]
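Row-wise `geodesic` calls are accurate but slow when applied across roughly 374,000 rows. A vectorized haversine distance is a common faster alternative: it treats the Earth as a sphere of radius 6,371 km (accurate to well under 1% at city scale) and computes the whole distance column in one NumPy pass. The sketch below is a minimal illustration, not part of the notebook's pipeline; the sample coordinates are illustrative values near central Melbourne, not rows from the dataset.

```python
import numpy as np
import pandas as pd

def haversine_km(lat, lon, ref_lat, ref_lon):
    """Vectorized great-circle distance in km from (ref_lat, ref_lon)."""
    lat1, lon1 = np.radians(lat), np.radians(lon)
    lat2, lon2 = np.radians(ref_lat), np.radians(ref_lon)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Illustrative sample: first point is the reference itself,
# second is a nearby CBD point, third is up in Carlton (~1.3 km away)
sample = pd.DataFrame({
    'latitude':  [-37.8183, -37.8170, -37.8067],
    'longitude': [144.9671, 144.9650, 144.9654],
})
ref = (-37.8183, 144.9671)  # illustrative reference point, not from the dataset
sample['distance_km'] = haversine_km(sample['latitude'], sample['longitude'], *ref)
print(sample)
```

On the full DataFrame this replaces the `df_cleaned.apply(calculate_distance, axis=1)` step with a single vectorized call, which is typically orders of magnitude faster.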
In [ ]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO
from geopy.distance import geodesic

# Step 1: Function to collect data from the Melbourne Open Data portal
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    export_format = 'csv'  # named to avoid shadowing the built-in `format`

    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1,  # Fetch all records
        'lang': 'en',
        'timezone': 'UTC',
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # Read the CSV content
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None

# Step 2: Set dataset_id for the API call
dataset_id = 'business-establishments-with-address-and-industry-classification'

# Step 3: Collect data and store in df
df = collect_data(dataset_id)

# Step 4: Proceed if data was loaded successfully
if df is not None:
    print(f'The dataset contains {len(df)} records.')
    print(df.head(5))  # Display the first 5 rows to inspect the data
else:
    print("Failed to retrieve data.")

# Step 5: Define the Chapel Street coordinates (hard-coded)
chapel_street_coords = (-37.8517106, 144.9939362)  # Latitude and Longitude of Chapel Street

# Step 6: Function to calculate the distance from Chapel Street
def calculate_distance(row):
    business_coords = (row['latitude'], row['longitude'])
    return geodesic(business_coords, chapel_street_coords).km

# Step 7: Drop rows with missing latitude or longitude
df_cleaned = df.dropna(subset=['latitude', 'longitude']).copy()  # .copy() avoids SettingWithCopyWarning when adding columns

# Step 8: Apply the distance calculation to the cleaned dataset
df_cleaned['distance_from_chapel'] = df_cleaned.apply(calculate_distance, axis=1)

# Step 9: Filter businesses within 3 km of Chapel Street
df_near_chapel = df_cleaned[df_cleaned['distance_from_chapel'] <= 3]

# Step 10: Group by industry for businesses near Chapel Street
if 'industry_anzsic4_description' in df.columns:
    industry_distribution_chapel = df_near_chapel.groupby('industry_anzsic4_description').size().reset_index(name='business_count')

    # Display the distribution
    print("Industry Distribution for Businesses near Chapel Street:\n", industry_distribution_chapel)

    # Step 11: Visualization - Horizontal bar chart for industries near Chapel Street
    plt.figure(figsize=(10, 6))

    # Sort data to display the top industries first (optional)
    industry_distribution_chapel = industry_distribution_chapel.sort_values(by='business_count', ascending=False).head(10)  # Top 10 industries

    # Create a horizontal bar chart
    plt.barh(industry_distribution_chapel['industry_anzsic4_description'], industry_distribution_chapel['business_count'], color='skyblue')

    # Customize the plot
    plt.title('Top 10 Industries for Businesses within 3 km of Chapel Street')
    plt.xlabel('Number of Businesses')
    plt.ylabel('Industry')
    plt.xticks(rotation=0)  # Ensure the x-axis labels are horizontal for readability

    # Show the chart
    plt.show()

else:
    print("Required column 'industry_anzsic4_description' is missing.")
The dataset contains 374210 records.
   census_year  block_id  property_id  base_property_id clue_small_area  \
0         2017       266       109851            109851         Carlton   
1         2017       266       109851            109851         Carlton   
2         2017       266       534003            534003         Carlton   
3         2017       266       664003            664003         Carlton   
4         2017       266       664005            664005         Carlton   

                   trading_name                          business_address  \
0  Metropoli's Research Pty Ltd  Level 1, 74 Victoria Street CARLTON 3053   
1             J Hong Restaurant  Ground , 74 Victoria Street CARLTON 3053   
2                  St2 Expresso           70 Victoria Street CARLTON 3053   
3            RMIT Resources Ltd           20 Cardigan Street CARLTON 3053   
4                        vacant           24 Cardigan Street CARLTON 3053   

   industry_anzsic4_code              industry_anzsic4_description  \
0                   6950  Market Research and Statistical Services   
1                   4511                     Cafes and Restaurants   
2                   4512                    Takeaway Food Services   
3                   8102                          Higher Education   
4                      0                              Vacant Space   

    longitude   latitude  
0  144.965352 -37.806701  
1  144.965352 -37.806701  
2  144.965473 -37.806714  
3  144.964753 -37.806312  
4  144.964772 -37.806203  
Industry Distribution for Businesses near Chapel Street:
                     industry_anzsic4_description  business_count
0                                  Accommodation             167
1                            Accounting Services             150
2    Adult, Community and Other Education n.e.c.              20
3                           Advertising Services             123
4                 Aged Care Residential Services              31
..                                           ...             ...
181        Waste Treatment and Disposal Services               3
182                      Water Freight Transport              22
183   Wired Telecommunications Network Operation              26
184                    Womens Clothing Retailing               2
185   Zoological and Botanical Gardens Operation              50

[186 rows x 2 columns]
[Figure: horizontal bar chart of the top 10 industries for businesses within 3 km of Chapel Street]
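The Flinders Street and Chapel Street cells above repeat the same clean–distance–filter–group pipeline with only the reference point and radius changed. One way to avoid that duplication is a single parameterized helper. The sketch below is a self-contained suggestion, not the notebook's own code: the function name `industry_distribution_near` is hypothetical, the distance step uses a spherical haversine formula instead of `geopy.geodesic` so the example needs no extra dependency, and the toy DataFrame exists only to demonstrate the call.

```python
import numpy as np
import pandas as pd

def industry_distribution_near(df, ref_coords, radius_km, top_n=10):
    """Count businesses per industry within radius_km of ref_coords (lat, lon).

    Assumes the CLUE-style columns 'latitude', 'longitude', and
    'industry_anzsic4_description'.
    """
    clean = df.dropna(subset=['latitude', 'longitude']).copy()
    lat = np.radians(clean['latitude'].to_numpy())
    lon = np.radians(clean['longitude'].to_numpy())
    rlat, rlon = np.radians(ref_coords[0]), np.radians(ref_coords[1])
    # Haversine great-circle distance on a 6,371 km sphere
    a = (np.sin((rlat - lat) / 2) ** 2
         + np.cos(lat) * np.cos(rlat) * np.sin((rlon - lon) / 2) ** 2)
    clean['distance_km'] = 2 * 6371.0 * np.arcsin(np.sqrt(a))
    nearby = clean[clean['distance_km'] <= radius_km]
    return (nearby.groupby('industry_anzsic4_description').size()
                  .reset_index(name='business_count')
                  .sort_values('business_count', ascending=False)
                  .head(top_n))

# Toy data for illustration only; in the notebook you would pass the real df
toy = pd.DataFrame({
    'latitude':  [-37.8183, -37.8170, -37.8067],
    'longitude': [144.9671, 144.9650, 144.9654],
    'industry_anzsic4_description': ['Cafes and Restaurants',
                                     'Cafes and Restaurants',
                                     'Higher Education'],
})
print(industry_distribution_near(toy, (-37.8183, 144.9671), radius_km=1))
```

With a helper like this, each corridor analysis reduces to one call, e.g. `industry_distribution_near(df, chapel_street_coords, radius_km=3)`, and the plotting code can be shared the same way.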